knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(durhamevp)

Introduction

This vignette documents how the data from crawling the BNA and WNO archives is stored in the database, how it can be retrieved, and how it can be analysed.

The database stores the information from the crawling process in three tables:

  1. The table portal_archivesearch stores information about the searches that have been run on the newspaper archive websites.
  2. The table portal_archivesearchresults stores information about the results returned by these searches.
  3. The table portal_candidatedocument stores the candidate documents.
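The archive_search_id column of portal_archivesearchresults refers to the id of a row in portal_archivesearch, so once both tables have been retrieved as data frames they can be joined with base R's merge(). The sketch below uses small made-up data frames in place of real database output; the search_text column and all values are hypothetical.

```r
# Hypothetical stand-ins for the data frames returned by
# get_archivesearches() and get_archivesearchresults().
# The search_text column and all values are invented for illustration.
archivesearches <- data.frame(
  id = c(1L, 2L),
  search_text = c("search A", "search B"),
  stringsAsFactors = FALSE
)
archivesearchresults <- data.frame(
  id = c(10L, 11L, 12L),
  archive_search_id = c(1L, 1L, 2L),
  description = c("result 1", "result 2", "result 3"),
  publication_title = c("Paper A", "Paper B", "Paper A"),
  stringsAsFactors = FALSE
)

# Attach each result to the search that produced it by matching
# archivesearchresults$archive_search_id against archivesearches$id.
merged <- merge(archivesearchresults, archivesearches,
                by.x = "archive_search_id", by.y = "id")
merged
```

Each row of merged is one search result, carrying the columns of the search that returned it.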

Archive searches

The portal_archivesearch table can be retrieved with the get_archivesearches() function.

# Retrieve the portal_archivesearch table as a data frame
archivesearches <- get_archivesearches()

# Show the last three searches
knitr::kable(tail(archivesearches, 3))
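tail() returns the last rows in whatever order the data frame happens to be in. If the table is not guaranteed to come back ordered by id (this vignette does not say), it can be sorted first. This is a minimal sketch with a made-up data frame, and treating the highest id as the most recent search is an assumption.

```r
# Hypothetical data frame standing in for get_archivesearches() output;
# the rows are deliberately out of order.
archivesearches <- data.frame(id = c(3L, 1L, 2L))

# Sort by id so that tail() returns the searches with the highest ids,
# whatever order the rows arrived in.
archivesearches <- archivesearches[order(archivesearches$id), , drop = FALSE]
tail(archivesearches, 3)

# Highest id, assuming ids increase as new searches are added.
lastid <- max(archivesearches$id)
```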

Archive search results

The portal_archivesearchresults table can be retrieved with the get_archivesearchresults() function. To retrieve only the results relating to a particular search, pass that search's id as the archive_search_id argument. For example, to retrieve the results returned by the most recent search (the last id in archivesearches$id):

# id of the last search in the table
lastid <- tail(archivesearches$id, 1)
# Retrieve only the results returned by that search
archivesearchresults <- get_archivesearchresults(archive_search_id = lastid)
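Passing archive_search_id to get_archivesearchresults() filters the results when they are retrieved. If the full results table has already been loaded into R, the same subset can be taken locally. This sketch uses a hypothetical data frame with column names taken from this vignette; the values are invented.

```r
# Hypothetical stand-in for a results table already retrieved into R.
archivesearchresults <- data.frame(
  id = 1:5,
  archive_search_id = c(1L, 1L, 2L, 2L, 2L),
  publication_title = c("Paper A", "Paper B", "Paper A", "Paper C", "Paper A"),
  stringsAsFactors = FALSE
)
lastid <- 2L

# Keep only the rows produced by the search whose id is lastid.
lastresults <- archivesearchresults[archivesearchresults$archive_search_id == lastid, ]
lastresults
```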

names(archivesearchresults)

dim(archivesearchresults)

knitr::kable(head(archivesearchresults[, c("id", "archive_search_id", "description", "publication_title")], 3))
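As a first step in analysing the results, one simple summary is how many results each newspaper contributed. The sketch below computes this with base R's table() on a small invented data frame; with real data, archivesearchresults would come from get_archivesearchresults().

```r
# Hypothetical results data frame; only publication_title is used.
archivesearchresults <- data.frame(
  id = 1:5,
  archive_search_id = 1L,
  publication_title = c("Paper A", "Paper B", "Paper A", "Paper C", "Paper A"),
  stringsAsFactors = FALSE
)

# Count the results per newspaper, most frequent first.
counts <- sort(table(archivesearchresults$publication_title), decreasing = TRUE)
counts
```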


gidonc/durhamevp documentation built on April 8, 2022, 10:31 a.m.